Training data provenance

Query use case

Do we trust the providers (the origin) of all training data used by an AI system? Trust is decided by checking every provider's email address against a whitelist.

Schemas used

Pseudo code

FUNCTION ai_system_providers_trusted_with_whitelist(AI_System_ID, Whitelist_Emails)
    // Step 1: Retrieve provider UUIDs associated with the AI system
    SET Provider_UUIDs = get list of providers contributing data to AI_System_ID

    // Step 2: Retrieve provider email addresses
    SET Provider_Emails = map provider UUIDs to their identity email addresses

    // Step 3: Check if all provider emails are in the whitelist
    IF Provider_Emails is a subset of Whitelist_Emails THEN
        RETURN True
    ELSE
        RETURN False
    END IF
END FUNCTION
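
The pseudocode above could be sketched in Python roughly as follows. This is a minimal illustration, not the actual query: the two lookup callables (`get_provider_uuids`, `get_provider_email`) and the sample data are hypothetical stand-ins for whatever backend stores the credentials and identities.

```python
from typing import Callable, Iterable


def ai_system_providers_trusted_with_whitelist(
    ai_system_id: str,
    whitelist_emails: Iterable[str],
    get_provider_uuids: Callable[[str], Iterable[str]],  # hypothetical lookup
    get_provider_email: Callable[[str], str],            # hypothetical lookup
) -> bool:
    # Step 1: provider UUIDs contributing data to the AI system
    provider_uuids = set(get_provider_uuids(ai_system_id))
    # Step 2: map each provider UUID to its identity email address
    provider_emails = {get_provider_email(u) for u in provider_uuids}
    # Step 3: trusted only if every provider email is in the whitelist
    return provider_emails <= set(whitelist_emails)


# Illustrative data only (not taken from any real system)
providers = {"ai-1": ["p1", "p2"]}
emails = {"p1": "alice@example.com", "p2": "bob@example.com"}

trusted = ai_system_providers_trusted_with_whitelist(
    "ai-1",
    ["alice@example.com", "bob@example.com"],
    lambda system_id: providers.get(system_id, []),
    emails.__getitem__,
)
```

Under these assumptions, an AI system with no recorded providers passes trivially, because the empty set is a subset of any whitelist. Whether that is the desired behaviour is a design decision for the real query.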

Explanation

Find relevant data sources:
- Retrieve the configuration verification credential (ConfigVcId) for the AI system.
- Extract the weights verification credential (WeightsVcId) used in training.
- Ensure that the WeightsVcId is classified as "Weights".
- Trace back to the training system that produced these weights.
- Identify the datapack used in the training process.
Extract the list of Data Verification Credentials (DataVcIds) used in training from the datapack.
Determine the providers who contributed this data:
- For each DataVcId, check its attestations and extract provider UUIDs where the attestation type is "provided".
Map provider UUIDs to their email identities.
Check if all provider emails exist in the whitelist and return True only if every provider is trusted.
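
The traversal described above can be sketched over in-memory records. This is only an illustration under stated assumptions: the dictionary layout and every field name (`config_vc_id`, `weights_vc_id`, `class`, `produced_by`, `datapack_id`, `data_vc_ids`, `attestations`) are hypothetical, since the actual schemas are not listed here.

```python
# Hypothetical records; all field names and values are assumptions.
AI_SYSTEMS = {"ai-1": {"config_vc_id": "cfg-1"}}
CONFIG_VCS = {"cfg-1": {"weights_vc_id": "w-1"}}
VCS = {
    "w-1": {"class": "Weights", "produced_by": "train-1", "attestations": []},
    "d-1": {"class": "Data", "attestations": [
        {"type": "provided", "provider": "p1"},
    ]},
    "d-2": {"class": "Data", "attestations": [
        {"type": "provided", "provider": "p2"},
        {"type": "reviewed", "provider": "p3"},  # skipped: not "provided"
    ]},
}
TRAINING_SYSTEMS = {"train-1": {"datapack_id": "dp-1"}}
DATAPACKS = {"dp-1": {"data_vc_ids": ["d-1", "d-2"]}}
IDENTITIES = {
    "p1": "alice@example.com",
    "p2": "bob@example.com",
    "p3": "eve@example.com",
}


def provider_emails_for(ai_system_id):
    # AI system -> configuration credential -> weights credential
    config_vc_id = AI_SYSTEMS[ai_system_id]["config_vc_id"]
    weights_vc_id = CONFIG_VCS[config_vc_id]["weights_vc_id"]
    weights_vc = VCS[weights_vc_id]
    if weights_vc["class"] != "Weights":
        raise ValueError(f"{weights_vc_id} is not classified as Weights")
    # Weights -> training system -> datapack -> data credentials
    training = TRAINING_SYSTEMS[weights_vc["produced_by"]]
    data_vc_ids = DATAPACKS[training["datapack_id"]]["data_vc_ids"]
    # Keep only providers named in attestations of type "provided"
    provider_uuids = {
        att["provider"]
        for vc_id in data_vc_ids
        for att in VCS[vc_id]["attestations"]
        if att["type"] == "provided"
    }
    # Map provider UUIDs to their email identities
    return {IDENTITIES[uuid] for uuid in provider_uuids}


def providers_trusted(ai_system_id, whitelist):
    # True only if every provider email appears in the whitelist
    return provider_emails_for(ai_system_id) <= set(whitelist)
```

With this sample data, `p3` never enters the check because its attestation type is not "provided", so a whitelist containing only the first two addresses is sufficient.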

Query

ai_system_providers_trusted_with_whitelist(AiSystemId, Whitelist)
link to query
link to simulator
